AN ALGORITHM WHICH CHARACTERIZES NATURAL LANGUAGE
DIALOGUE EXPRESSIONS


COLBY AND PARKISON

OUTLINE

INTRODUCTORY - Discussion of language as code, other approaches;
        sentence versus word dictionary using projection
        rules to yield an interpretation from word definitions;
        experience with old Parry.
PROBLEMS - dialogue problems and methods. Constraints. Special cases.
    Preprocessing - dict words only
        translations
        contractions
        expansions
        synonyms
        negation
    Segmenting - prepositions, wh-words, meta-verbs
        give list
    Matching - simple and compound patterns
        association with semantic functions
        first coarsening - drop fillers - give list
        second coarsening - drop one word at a time
        dangers of insertion and restoration
    Recycle condition - sometimes a pattern containing pronouns
        is matched, like "DO YOU AVOID THEM". If THEM could be
        a number of different things and Parry's answer depends on
        which one it is, then the current value of the anaphora,
        THEM, is substituted for THEM and the resulting pattern
        is looked up. Hopefully, this will produce a match to a
        more specific pattern, like "DO YOU AVOID MAFIA".
    default condition - pass surface to memory
        change topic or level
Advantages - real-time performance, pragmatic adequacy and
        effectiveness, performance measures.
        "learning" by adding patterns
        PARRY1 ignored word order - penalty too great
        PARRY1 too sequential, taking first pattern it found
        rather than looking at whole input and then deciding.
        PARRY1 had patterns strung out throughout procedures
        and thus cumbersome for programmer to see what patterns were.
Limitations - typical failures, possible remedies
Summary

     By "characterize" we are referring to a process, a
multi-stage sequence of functions, which progressively transforms
natural language input expressions into a pattern which eventually
best matches a stored pattern whose name has a pointer to the name of
a response function. Response functions decide what to do once the
input has been characterized. Here we shall discuss only the
characterizing functions, except for one response function (anaphoric
substitution) which interactively aids the characterization process.
     In constructing and testing a simulation of paranoid
processes, we were faced with the problem of reproducing paranoid
linguistic behavior in a diagnostic psychiatric interview. The
diagnosis of paranoid states, reactions or modes is made by
clinicians who judge a degree of correspondence between what they
observe linguistically in an interview and their conceptual model of
paranoid behavior. There exists a high degree of agreement about this
conceptual model, which relies mainly on what an interviewee says and
how he says it.
     Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. Thus it must be
able to deal with the linguistic behavior of the interviewer well
enough to achieve the desired effects.
     There are a number of approaches one might consider in
handling dialogue expressions. One approach would be to
construct a dictionary of all expressions which could possibly arise
in an interview. Associated with each expression would be its
interpretation, depending on dialogue context. For obvious reasons,
no one takes this approach seriously. Instead of an expression
dictionary, one might construct a word dictionary and then use
projection rules to yield an interpretation of a sentence from the
dictionary definitions. This, for example, has been the approach of
Winograd [ ] and Woods [ ]. Such a method performs adequately as long
as the dictionary involves only a few hundred words, each word having
only one or two senses, and the dialogue is limited to a mini-world
of only a few objects and relations. But the problems which arise in
a psychiatric interview conducted in unrestricted English are too
great for this method to be useful in real-time dialogues requiring a
waiting time of less than 10 seconds.
     Little is known about how humans process natural language.
They can be shown to possess some knowledge of grammar rules, but this
does not entail that they use a grammar in interpreting and producing
language. Irregular verb tenses and noun plurals do not follow rules,
yet people use thousands of them. One school of linguistics believes
that people possess full transformational grammars for processing
language. In our view this position seems dubious. Originally
transformational grammars were not designed to "understand" a large
subset of English; they represented a set of axioms for deciding
whether a string is "grammatical". Efforts to use them for other
purposes have not been fruitful.
     An analysis of what the problem is guides one to the
selection or invention of methods appropriate to its solution. Our
problem was not to develop a consistent theory of language or to
assert empirical hypotheses about how people process language. Our
problem was to characterize what is being said in a dialogue and what
is being said about it in order to make a response such that a sample
of I-O pairs from the paranoid model is judged similar to a sample of
I-O pairs from paranoid patients. We are not making an existence
claim that our strategy represents the way people process language.
We sought an efficacious method which could operate efficiently in
real time. Its relation to methods humans use is, by way of analogy,
a workable possibility, i.e. "something like this" might occur in
people.
     For our problem, managing the communicative uses and effects
of natural language, we adopted a strategy of transforming the input
until a pattern is achieved which matches, to some degree, a stored
pattern. This strategy is adequate for our purposes a satisfactory
percentage of the time. (No one expects an algorithm to be
successful 100% of the time, since not even humans, the best natural
language system around, achieve this level of performance). The
power of this method for natural language dialogues lies in its
ability to ignore unrecognizable expressions and irrelevant details.
A conventional parser doing word-by-word analysis fails when it
cannot find one or more of the input words in its dictionary. Its
weakness is that it must know; it cannot guess.
     In early versions of the paranoid model (PARRY1), many of
the pattern recognition mechanisms were weak because they allowed the
elements of the pattern to be order-independent. For example,
consider the following expressions:
     (1) WHERE DO YOU WORK?
     (2) WHAT SORT OF WORK DO YOU DO?
     (3) WHAT IS YOUR OCCUPATION?
     (4) WHAT DO YOU DO FOR A LIVING?
     (5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt this even qualifies as a pattern, since
interrelations between words are ignored and only their presence is
considered). An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But we found from
experience that, since English relies heavily on word order to convey
the meaning of its messages, the mean penalty of errors was too
great. Hence in PARRY2, as will be described in detail, all the
patterns require a specified word order.
     It is a truism for high-complexity problems that it is useful
to have constraints. Diagnostic psychiatric interviews (and
especially those conducted over teletypes) have several natural
constraints. First, clinicians are trained to ask certain questions
in certain ways. These stereotypes can be treated as special cases.
Second, only a few hundred standard topics are brought up by
interviewers, who are trained to use everyday expressions and
especially those used by the patient himself. When the interview is
conducted over teletypes, expressions tend to be shortened since the
interviewer tries to increase the information transmission rate over
the slow channel of a teletype. (It is said that short expressions
are more grammatical, but think about the phrase "Now now, there
there.") Finally, teletyped interviews represent written speech.
These expressions are full of idioms, cliches, pat phrases, etc. -
all being easy prey for a pattern recognition approach. It is futile
to try to decode an idiom by analyzing the meanings of its individual
words. One knows what an idiom refers to or one does not.
     We shall describe the algorithm in sections devoted to
preprocessing, segmenting, and matching and recycling.

PREPROCESSING

     Each word in the input expression is first looked up in a
dictionary of 1240 words. The dictionary consists of a list of words
and other words they can be translated into. (SHOW PIECE OF DICT?) If
a word in the input is not in the dictionary, it is dropped from the
pattern being formed. Thus if the input were:
     WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" is not in the dictionary, the pattern at this
phase becomes:
     ( WHAT IS YOUR OCCUPATION )
The question mark is thrown away, since questions are recognized by
word order. (A statement followed by a question mark ( YOU GAMBLE? )
is considered to be communicatively equivalent in its effects to that
statement followed by a period.) Synonymic translations of words are
made so that the pattern becomes, for example:
     ( WHAT BE YOU JOB )
Groups of words are translated into a single word so that, for
example, "for a living" becomes "job".
     Certain juxtaposed words are made into a single word, e.g.
"GET ALONG WITH" becomes "GETALONGWITH". This is done (1) to deal
with groups of words which are represented as single words in the
stored pattern and (2) to prevent segmentation from occurring at the
wrong places, such as at a preposition inside an idiom. Besides
these contractions, certain expansions are made so that, for example,
"DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
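
     The preprocessing pass can be pictured with the following
sketch. It is illustrative only, not PARRY2's actual code; the tiny
expansion, contraction, and dictionary tables are invented stand-ins
for the real 1240-word tables.

    # Sketch of the preprocessing pass (illustrative; the tables below
    # are invented stand-ins for the actual 1240-word dictionary).
    EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}

    # Word groups contracted to a single token, e.g. FOR A LIVING -> JOB.
    CONTRACTIONS = {("GET", "ALONG", "WITH"): "GETALONGWITH",
                    ("FOR", "A", "LIVING"): "JOB"}

    # Known words and their translations; unknown words are dropped.
    DICTIONARY = {"WHAT": "WHAT", "IS": "BE", "ARE": "BE", "YOUR": "YOU",
                  "YOU": "YOU", "OCCUPATION": "JOB", "WORK": "JOB",
                  "JOB": "JOB", "GETALONGWITH": "GETALONGWITH"}

    def preprocess(expression):
        # Punctuation is discarded; questions are recognized by word order.
        words = expression.upper().replace("?", " ").replace(".", " ").split()
        expanded = []
        for w in words:                    # expansions: DON'T -> DO NOT
            expanded.extend(EXPANSIONS.get(w, [w]))
        tokens, i = [], 0
        while i < len(expanded):           # contract known word groups
            for group, single in CONTRACTIONS.items():
                if tuple(expanded[i:i + len(group)]) == group:
                    tokens.append(single)
                    i += len(group)
                    break
            else:
                tokens.append(expanded[i])
                i += 1
        # Translate via the dictionary, dropping words not found in it.
        return [DICTIONARY[t] for t in tokens if t in DICTIONARY]

    print(preprocess("What is your current occupation?"))
    # ['WHAT', 'BE', 'YOU', 'JOB']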

SEGMENTING

     Borrowing a heuristic from machine-translation work by Wilks
[ ], and supported by evidence from psycholinguistic experiments
indicating that humans recognize spoken sentences a phrase at a time,
we devised a way of bracketing the pattern constructed up to this
point into shorter segments, using the list of words in Fig. 1. The
new pattern formed is termed either "simple", having no delimiters
within it, or "compound", i.e. being made up of two or more simple
patterns. A simple pattern might be:
     ( WHAT BE YOU JOB )
whereas a compound pattern would be:
     (( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns rarely consist of more than three or four fragments.
     After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
to replace the commonly omitted "THAT", such that:
     ( I THINK YOU BE AFRAID )
becomes
     (( I THINK ) ( YOU BE AFRAID ))
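
     A minimal sketch of the segmenter follows, assuming an invented
delimiter list (Fig. 1 gives the actual one). It brackets a pattern
at delimiter words and after THINK/FEEL-type verbs.

    # Sketch of the segmenter (the delimiter list is an illustrative
    # guess; the actual list appears in Fig. 1).
    DELIMITERS = {"IN", "ON", "AT", "TO", "WHY", "WHAT", "WHERE", "HOW"}
    THAT_VERBS = {"THINK", "FEEL", "BELIEVE"}  # "THAT" often omitted after these

    def segment(pattern):
        segments, current = [], []
        for word in pattern:
            # A delimiter opens a new segment (unless it begins the pattern).
            if word in DELIMITERS and current:
                segments.append(current)
                current = []
            current.append(word)
            # Bracket after THINK/FEEL to restore the omitted "THAT".
            if word in THAT_VERBS:
                segments.append(current)
                current = []
        if current:
            segments.append(current)
        # One segment means a simple pattern; several mean a compound one.
        return segments if len(segments) > 1 else segments[0]

    print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
    # [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
    print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
    # [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]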

PREPARATION FOR MATCHING

     Conjunctions serve only as markers for the segmenter, and they
are dropped after segmentation.
     Negations are handled by extracting the "NOT" from the
pattern and assigning a value to a global variable which indicates to
the algorithm that the expression is negative in form. When a pattern
is finally matched, this variable is consulted. Some patterns have a
pointer to a pattern of opposite meaning if a "NOT" could reverse
their meanings. If this pointer is present and a "NOT" is found,
then the pattern matched is replaced by its opposite. (Roger - need
good example).
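
     In outline, the negation step might look like the following
sketch; the opposite-pattern table and response names shown are
hypothetical.

    # Sketch of negation handling: remove "NOT", set a global flag, and
    # consult the flag once a pattern is matched. The OPPOSITE table is
    # a hypothetical example of opposite-meaning pointers.
    negative = False

    OPPOSITE = {("YOU", "BE", "SICK"): ("YOU", "BE", "WELL")}

    def extract_not(pattern):
        global negative
        negative = "NOT" in pattern
        return [w for w in pattern if w != "NOT"]

    def resolve(matched):
        # If the expression was negative and an opposite exists, flip.
        if negative and matched in OPPOSITE:
            return OPPOSITE[matched]
        return matched

    p = tuple(extract_not(["YOU", "BE", "NOT", "SICK"]))
    print(resolve(p))   # ('YOU', 'BE', 'WELL')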

MATCHING AND RECYCLING

     The algorithm now attempts to match the segmented patterns
with the stored patterns, of which there are currently 1024. First a
complete and perfect match is sought. When a match is found, the
stored pattern name has a pointer to the name of a response function
which decides what to do further. If a match is not found, further
transformations of the pattern are carried out and a "fuzzy" match is
tried.
     For fuzzy matching at this stage, the contentive words in the
pattern are dropped one at a time and a match attempted each time.
This allows ignoring familiar words in unfamiliar contexts. For
example, "well" is important in "Are you well?" but meaningless in
"Well are you?". Deleting one word at a time results in, for
example, the pattern:
     ( WHAT BE YOU MAIN PROBLEM )
becoming successively:
     (a) ( BE YOU MAIN PROBLEM )
     (b) ( WHAT YOU MAIN PROBLEM )
     (c) ( WHAT BE MAIN PROBLEM )
     (d) ( WHAT BE YOU PROBLEM )
     (e) ( WHAT BE YOU MAIN )
Since the stored pattern, in this case, matches (d), (e) would not be
constructed. We found it unwise to delete more than one word, since
our segmentation method yields segments containing a small (1-4)
number of words.
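
     The following sketch illustrates the two matching steps: a
perfect match first, then the one-word-at-a-time deletions. For
simplicity it drops every word in turn rather than only contentives,
and the stored pattern table and response-function name are invented.

    # Sketch of exact-then-fuzzy matching against stored patterns.
    STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "ANSWER_PROBLEM_QUESTION"}

    def match(pattern):
        key = tuple(pattern)
        if key in STORED:                  # complete and perfect match
            return STORED[key]
        for i in range(len(pattern)):      # drop one word at a time
            coarsened = tuple(pattern[:i] + pattern[i + 1:])
            if coarsened in STORED:
                return STORED[coarsened]
        return None                        # no match: try further steps

    print(match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"]))
    # ANSWER_PROBLEM_QUESTION  (found via deletion (d) above)
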
     The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. The
pattern:
     ( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients. When
such a pattern is recognized, the pronoun is replaced by its current
anaphoric value, and a more specific pattern such as:
     ( DO YOU AVOID MAFIA )
is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:
     (( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
the meaning of THEY is clarified by ( IN HOSPITAL ).
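
     A sketch of the substitution step follows, with a hypothetical
anaphora table and stored pattern:

    # Sketch of anaphoric substitution: replace a pronoun by its current
    # value and look up the resulting, more specific pattern.
    ANAPHORA = {"THEM": "MAFIA"}           # current anaphoric values

    STORED = {("DO", "YOU", "AVOID", "MAFIA"): "DISCUSS_MAFIA_AVOIDANCE"}

    def recycle(pattern):
        specific = tuple(ANAPHORA.get(w, w) for w in pattern)
        return STORED.get(specific)        # None if still no match

    print(recycle(("DO", "YOU", "AVOID", "THEM")))
    # DISCUSS_MAFIA_AVOIDANCE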

COMPLEX-PATTERN MATCH

     When more than one simple pattern is detected in the input, a
second matching is attempted. The methods used are similar to the
first matching. Certain patterns, such as ( HELLO ) and ( I THINK ),
are dropped because they are meaningless. If a complete match is not
found, then simple patterns are dropped, one at a time, from the
complex pattern. This allows the input:
     (( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern:
     (( HOW DO YOU COME ) ( IN HOSPITAL ))
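
     A sketch of this second matching, using the example above (the
response-function name is invented):

    # Sketch of the complex-pattern match: drop meaningless fragments,
    # try a complete match, then drop simple patterns one at a time.
    MEANINGLESS = {("HELLO",), ("I", "THINK")}

    STORED = {(("HOW", "DO", "YOU", "COME"),
               ("IN", "HOSPITAL")): "EXPLAIN_ADMISSION"}

    def match_compound(simples):
        parts = [tuple(p) for p in simples if tuple(p) not in MEANINGLESS]
        if tuple(parts) in STORED:
            return STORED[tuple(parts)]
        for i in range(len(parts)):        # drop one simple pattern at a time
            coarsened = tuple(parts[:i] + parts[i + 1:])
            if coarsened in STORED:
                return STORED[coarsened]
        return None                        # default condition

    print(match_compound([["HOW", "DO", "YOU", "COME"],
                          ["TO", "BE"], ["IN", "HOSPITAL"]]))
    # EXPLAIN_ADMISSION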

     If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response functions
decide what to do. For example, in a default condition, the model
may assume control of the interview, asking the interviewer a
question, continuing with the topic under discussion, or introducing
a new topic.

ADVANTAGES AND LIMITATIONS

     As mentioned, one of the main advantages of a
characterization strategy is that it can ignore what it does NOT
recognize. There are at least 415,000 words in English, each
possessing one to one hundred senses. To construct a machine-usable
dictionary of this magnitude is out of the question at this time. A
characterization of natural language input such as that described
above allows real-time interaction in a dialogue, since it avoids
becoming ensnarled in "understanding" and metainterpretations of
language which would slow down a dialogue to impracticality, if it
could even occur at all.
     A drawback of PARRY1 was that it reacted to the first pattern
it found in the input rather than characterizing the input as fully
as possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that the patterns were strung out in various
procedures throughout the algorithm. It was often a considerable
chore for the programmer to determine whether a given pattern was
present and precisely where it was. In the above-described method,
the patterns are all collected in one part of the data base, where
they can easily be examined.
     Concentrating all the patterns in the data base gives PARRY2
a limited "learning" ability. When an input fails to match any
stored pattern, or matches an incorrect one as judged by a human
operator, a pattern matching the input can be put into the data base
automatically. If the new pattern has the same meaning as a
previously stored pattern, the human operator must provide the name
of the appropriate response function. If he does not remember the
name, he may try to rephrase the input in a form recognizable to
PARRY2, and it will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data base with very little redundant effort.
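
     The mechanism can be pictured with the following sketch; the
function and response names are hypothetical.

    # Sketch of the pattern-adding ("learning") mechanism: a new pattern
    # is tied either to a response function the operator names, or to the
    # one found for an operator-supplied rephrasing.
    STORED = {("WHAT", "BE", "YOU", "JOB"): "ANSWER_JOB_QUESTION"}

    def learn(new_pattern, response_name=None, rephrasing=None):
        if response_name is None and rephrasing is not None:
            # Operator rephrased the input; reuse the rephrasing's response.
            response_name = STORED.get(tuple(rephrasing))
        if response_name is not None:
            STORED[tuple(new_pattern)] = response_name

    learn(("WHAT", "BE", "YOU", "TRADE"),
          rephrasing=("WHAT", "BE", "YOU", "JOB"))
    print(STORED[("WHAT", "BE", "YOU", "TRADE")])
    # ANSWER_JOB_QUESTION
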
     We have a number of performance measures on PARRY1 along a
number of dimensions, including "linguistic non-comprehension". That
is, judges estimated PARRY1's abilities along this dimension on a 0-9
scale. They also rated human patients and a "random" version of
PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS). We have
collected ratings of PARRY2 along this dimension to determine if the
characterization process represents an improvement over PARRY1.
(FRANK AND KEN EXPERIMENT).